For the First Time, AI Analyzes Language as Well as a Human Expert
If language is what makes us human, what does it mean now that large language models have gained "metalinguistic" abilities? Among the myriad abilities that humans possess, which ones are uniquely human? Language has been a top candidate at least since Aristotle, who wrote that humanity was "the animal that has language." Even as large language models such as ChatGPT superficially replicate ordinary speech, researchers want to know if there are specific aspects of human language that simply have no parallels in the communication systems of other animals or artificially intelligent devices. In particular, researchers have been exploring the extent to which language models can reason about language itself.
LGM: Enhancing Large Language Models with Conceptual Meta-Relations and Iterative Retrieval
Lei, Wenchang, Zou, Ping, Wang, Yue, Sun, Feng, Zhao, Lei
Large language models (LLMs) exhibit strong semantic understanding, yet struggle when user instructions involve ambiguous or conceptually misaligned terms. We propose the Language Graph Model (LGM) to enhance conceptual clarity by extracting meta-relations (inheritance, alias, and composition) from natural language. The model further employs a reflection mechanism to validate these meta-relations. A Concept Iterative Retrieval Algorithm then dynamically supplies these relations and their related descriptions to the LLM, improving its ability to interpret concepts and generate accurate responses. Unlike conventional Retrieval-Augmented Generation (RAG) approaches that rely on extended context windows, our method enables large language models to process texts of any length without truncation. Experiments on standard benchmarks demonstrate that the LGM consistently outperforms existing RAG baselines.
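The abstract's iterative retrieval over meta-relations can be sketched as a bounded graph expansion. This is only an illustrative reading of the abstract, not the paper's implementation; the graph contents, hop limit, and traversal policy below are assumptions.

```python
# Sketch: concepts linked by the three meta-relations named in the
# abstract (inheritance, alias, composition); retrieval iteratively
# expands from the concepts mentioned in a query.
from collections import defaultdict, deque

class ConceptGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # concept -> [(relation, concept)]

    def add(self, src, relation, dst):
        assert relation in {"inheritance", "alias", "composition"}
        self.edges[src].append((relation, dst))

    def retrieve(self, seeds, max_hops=2):
        """Breadth-first expansion from seed concepts, up to max_hops."""
        seen = set(seeds)
        frontier = deque((s, 0) for s in seeds)
        while frontier:
            concept, hops = frontier.popleft()
            if hops == max_hops:
                continue
            for _, neighbor in self.edges[concept]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, hops + 1))
        return seen

g = ConceptGraph()
g.add("sedan", "inheritance", "car")   # a sedan is a kind of car
g.add("car", "composition", "engine")  # a car has an engine
related = g.retrieve({"sedan"})
```

The retrieved set (here `{"sedan", "car", "engine"}`) would be serialized into the prompt, which is how a fixed-size context can cover texts of any length.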
DETECT: Determining Ease and Textual Clarity of German Text Simplifications
Korobeynikova, Maria, Battisti, Alessia, Fischer, Lukas, Gao, Yingqiang
Current evaluation of German automatic text simplification (ATS) relies on general-purpose metrics such as SARI, BLEU, and BERTScore, which insufficiently capture simplification quality in terms of simplicity, meaning preservation, and fluency. While specialized metrics like LENS have been developed for English, corresponding efforts for German have lagged behind due to the absence of human-annotated corpora. To close this gap, we introduce DETECT, the first German-specific metric that holistically evaluates ATS quality across all three dimensions of simplicity, meaning preservation, and fluency, and is trained entirely on synthetic large language model (LLM) responses. Our approach adapts the LENS framework to German and extends it with (i) a pipeline for generating synthetic quality scores via LLMs, enabling dataset creation without human annotation, and (ii) an LLM-based refinement step for aligning grading criteria with simplification requirements. We also construct what is, to our knowledge, the largest German human-evaluation dataset for text simplification, and use it to validate our metric directly. Experimental results show that DETECT achieves substantially higher correlations with human judgments than widely used ATS metrics, with particularly strong gains in meaning preservation and fluency. Beyond ATS, our findings highlight both the potential and the limitations of LLMs for automatic evaluation and provide transferable guidelines for general language accessibility tasks.
Modeling the language cortex with form-independent and enriched representations of sentence meaning reveals remarkable semantic abstractness
Saha, Shreya, Li, Shurui, Tuckute, Greta, Li, Yuanning, Zhang, Ru-Yuan, Wehbe, Leila, Fedorenko, Evelina, Khosla, Meenakshi
The human language system represents both linguistic forms and meanings, but the abstractness of the meaning representations remains debated. Here, we searched for abstract representations of meaning in the language cortex by modeling neural responses to sentences using representations from vision and language models. When we generate images corresponding to sentences and extract vision model embeddings, we find that aggregating across multiple generated images yields increasingly accurate predictions of language cortex responses, sometimes rivaling large language models. Similarly, averaging embeddings across multiple paraphrases of a sentence improves prediction accuracy compared to any single paraphrase. Enriching paraphrases with contextual details that may be implicit (e.g., augmenting "I had a pancake" to include details like "maple syrup") further increases prediction accuracy, even surpassing predictions based on the embedding of the original sentence, suggesting that the language system maintains richer and broader semantic representations than language models. Together, these results demonstrate the existence of highly abstract, form-independent meaning representations within the language cortex.
A Beam Search Algorithm
Algorithm 1 presents the step-by-step operations of our beam search algorithm (see Sec. 4.3). In the current work we consider recovering sentences; we leave recovering longer paragraphs as future work. For each dataset, we hold out 2,000 examples as the evaluation set and use the rest for training. In the results, "End-to-End" denotes end-to-end optimization, "Reg" denotes the inclusion of a regularization term, and "DR" refers to a discrete token (...). Our approach is unique in that it does not rely on end-to-end optimization and is demonstrated on large batch sizes (i.e., ...).
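The beam search in Algorithm 1 follows the standard pattern: expand every partial sequence by one token, score the candidates, and keep only the top few. A minimal generic sketch is below; the scoring function and vocabulary are placeholders, not the paper's actual sentence-recovery objective.

```python
def beam_search(score_fn, vocab, max_len, beam_width=4):
    """Generic beam search: at each step, extend every beam by one
    token and retain the `beam_width` highest-scoring sequences."""
    beams = [([], 0.0)]  # (token sequence, cumulative score)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok in vocab:
                candidates.append((seq + [tok], score + score_fn(seq, tok)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune to the beam width
    return beams[0]

# Toy scorer that rewards matching a target sentence token-by-token;
# in the paper this role is played by the recovery objective.
target = ["the", "cat", "sat"]

def prefer_target(seq, tok):
    i = len(seq)
    return 1.0 if i < len(target) and tok == target[i] else 0.0

best, total = beam_search(prefer_target, ["the", "cat", "sat", "dog"],
                          max_len=3)
```

With `beam_width=4` and this toy scorer, the search recovers `["the", "cat", "sat"]` with score 3.0; a larger beam trades compute for a lower chance of pruning the true prefix early.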
What am I missing here?: Evaluating Large Language Models for Masked Sentence Prediction
Wyatt, Charlie, Joshi, Aditya, Salim, Flora
Transformer-based models primarily rely on Next Token Prediction (NTP), which predicts the next token in a sequence based on the preceding context. However, NTP's focus on single-token prediction often limits a model's ability to plan ahead or maintain long-range coherence, raising questions about how well LLMs can predict longer spans, such as full sentences within structured documents. While NTP encourages local fluency, it provides no explicit incentive to ensure global coherence across sentence boundaries, an essential skill for reconstructive or discursive tasks. To investigate this, we evaluate three commercial LLMs (GPT-4o, Claude 3.5 Sonnet, and Gemini 2.0 Flash) on Masked Sentence Prediction (MSP), the task of infilling a randomly removed sentence, across three domains: ROCStories (narrative), Recipe1M (procedural), and Wikipedia (expository). We assess both fidelity (similarity to the original sentence) and cohesiveness (fit within the surrounding context). Our key finding is that commercial LLMs, despite their superlative performance on other tasks, are poor at predicting masked sentences in low-structured domains, highlighting a gap in current model capabilities.
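Constructing an MSP instance amounts to removing one sentence and presenting the rest as context. The sketch below shows one plausible way to build such an instance; the prompt wording and the `[MASK]` marker are assumptions, not the paper's exact setup.

```python
# Sketch: build a Masked Sentence Prediction example by removing a
# randomly chosen sentence and marking its position in the context.
import random

def make_msp_example(sentences, rng=random):
    idx = rng.randrange(len(sentences))
    target = sentences[idx]  # gold sentence the model must infill
    context = sentences[:idx] + ["[MASK]"] + sentences[idx + 1:]
    prompt = ("Fill in the missing sentence marked [MASK]:\n"
              + " ".join(context))
    return prompt, target

sents = ["Preheat the oven.", "Mix the batter.", "Bake for 30 minutes."]
prompt, gold = make_msp_example(sents, random.Random(0))
```

Fidelity is then scored by comparing the model's infill against `gold`, and cohesiveness by how well the infill fits the surrounding context.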
What's Taboo for You? - An Empirical Evaluation of LLMs Behavior Toward Sensitive Content
Ferrara, Alfio, Picascia, Sergio, Pinnavaia, Laura, Ranitovic, Vojimir, Rocchetti, Elisabetta, Tuveri, Alice
Proprietary Large Language Models (LLMs) have shown tendencies toward politeness, formality, and implicit content moderation. While previous research has primarily focused on explicitly training models to moderate and detoxify sensitive content, there has been limited exploration of whether LLMs implicitly sanitize language without explicit instructions. This study empirically analyzes the implicit moderation behavior of GPT-4o-mini when paraphrasing sensitive content and evaluates the extent of sensitivity shifts. Our experiments indicate that GPT-4o-mini systematically moderates content toward less sensitive classes, with substantial reductions in derogatory and taboo language. We also evaluate the zero-shot capabilities of LLMs in classifying sentence sensitivity, comparing their performance against traditional methods.